Parallel Text Processing: Alignment and Use of Translation Corpora
Abstract
In the past ten to fifteen years considerable progress has been made in the field of parallel text alignment. The term parallel text itself is now well-established within the computational linguistics community. It refers to texts accompanied by their translations in one or several languages. Aligned texts have proved to be an invaluable source of translation data for terminology banks and bilingual dictionaries. Translation alignment is currently providing the basis for the development of a new generation of tools to assist human translators and to improve the quality and productivity of their work.
Similar resources
Extracting Parallel Corpora from Comparable Documents to Improve Translation Quality in Machine Translation Systems
Data used for training statistical machine translation methods are usually prepared from three kinds of resources: parallel, non-parallel, and comparable text corpora. Parallel corpora are an ideal resource for translation, but because such texts are scarce, non-parallel and comparable corpora are also used for parallel text extraction. Most existing methods for exploiting comparable corpora loo...
JMaxAlign: A Maximum Entropy Parallel Sentence Alignment Tool
Parallel corpora are an extremely useful tool in many natural language processing tasks, particularly statistical machine translation. Parallel corpora for certain language pairs, such as Spanish or French, are widely available, but for many language pairs, such as Bengali and Chinese, it is impossible to find parallel corpora. Several tools have been developed to automatically extract parallel...
Using Parallel Corpora to Create a Greek-English Dictionary with Uplug
This paper presents the construction of a Greek-English bilingual dictionary from parallel corpora that were created manually by collecting documents retrieved from the Internet. The parallel corpora were processed by the Uplug word alignment system without the use of language-specific information. A sample was extracted from the population of suggested translations and was included i...
Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora
The parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and also demands linguistic skill. In the present work, a technique has been developed that extracts parallel corpus between Manipuri, a morphologicall...
A new model for Persian multi-part words edition based on statistical machine translation
Multi-part words in English are hyphenated, with the hyphen separating the parts. Persian has multi-part words as well. According to Persian morphology, a half-space character is needed to separate the parts of a multi-part word, but in many cases people incorrectly use a space character instead. This common incorrect use of the space leads to some s...